| | FEMALE (N=383) | MALE (N=232) | Overall (N=615) |
|---|---|---|---|
| Age | | | |
| 18-44 | 13 (3.4%) | 13 (5.6%) | 26 (4.2%) |
| 45-59 | 65 (17.0%) | 24 (10.3%) | 89 (14.5%) |
| 60-69 | 107 (27.9%) | 64 (27.6%) | 171 (27.8%) |
| 70-79 | 155 (40.5%) | 92 (39.7%) | 247 (40.2%) |
| 80+ | 42 (11.0%) | 38 (16.4%) | 80 (13.0%) |
| Unknown | 1 (0.3%) | 1 (0.4%) | 2 (0.3%) |
| Race | | | |
| American Indian or Alaska Native | 2 (0.5%) | 2 (0.9%) | 4 (0.7%) |
| Asian | 119 (31.1%) | 71 (30.6%) | 190 (30.9%) |
| Black or African American | 10 (2.6%) | 7 (3.0%) | 17 (2.8%) |
| Native Hawaiian or Other Pacific Islander | 2 (0.5%) | 2 (0.9%) | 4 (0.7%) |
| Unknown | 35 (9.1%) | 27 (11.6%) | 62 (10.1%) |
| White | 215 (56.1%) | 123 (53.0%) | 338 (55.0%) |
| Ethnicity | | | |
| Hispanic or Latino | 35 (9.1%) | 14 (6.0%) | 49 (8.0%) |
| Not Hispanic or Latino | 337 (88.0%) | 210 (90.5%) | 547 (88.9%) |
| Unknown | 11 (2.9%) | 8 (3.4%) | 19 (3.1%) |
AI for Automatic Synoptic Reporting
CAP forms Description
CAP (College of American Pathologists) forms are standardized cancer reporting protocols that have revolutionized pathology practice by replacing inconsistent narrative reports with structured, synoptic formats containing essential diagnostic and prognostic information. Developed over 35 years ago to address significant variability in cancer reporting, these evidence-based protocols ensure complete, uniform documentation of malignant tumors across all healthcare institutions, directly improving patient outcomes and clinical decision-making.
Lung resection CAP forms are particularly critical in thoracic oncology, providing standardized reporting templates for primary lung cancers that include essential elements such as tumor size, histologic type and grade, surgical margins, lymph node status, and staging classifications. These lung-specific protocols have demonstrated measurable clinical impact, with studies showing that synoptic reporting achieves 88.4% completeness compared to only 2.6% for traditional descriptive reports, leading to more accurate staging, better treatment planning, and improved survival rates.
By establishing consistent terminology and data capture requirements, CAP lung resection forms enhance communication between pathologists and oncologists, ensure regulatory compliance with Commission on Cancer standards, and provide the structured data foundation necessary for personalized cancer care, targeted therapy selection, and multidisciplinary treatment coordination. The widespread adoption of these standardized protocols, supported by electronic integration into laboratory information systems, has positioned pathologists as key members of the lung cancer care team while enabling seamless data exchange for cancer registries, research, and quality improvement initiatives.
CAP forms Dataset Description
At Stanford, CAP forms were implemented within Epic using SmartForms. SmartForms is an Epic product that allows the capture of semi-structured information within the EHR that can later be used to generate free-form reports. In this case, the structured information from the CAP forms is captured on SmartForms, which later generate the synoptic reporting section within the pathology report. As a consequence, for relevant cases, the pathology report will contain this section.
For this task, we aim to use AI to populate the CAP forms automatically by using all the other elements from the pathology report. We formulated this as a question-answering problem, where the context for each question is the entire pathology report, the question is the particular CAP form element, and the answer is the value/selection to be populated.
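The question-answering formulation above can be sketched as a small data structure. This is only an illustration: the `QAExample` class, the `build_examples` helper, the element names, and the toy report text are hypothetical and are not taken from the actual dataset or pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    context: str   # pathology report text used as context
    question: str  # CAP form element to populate
    answer: str    # reference value/selection recorded on the form


def build_examples(report_text, cap_form):
    """Create one QA example per CAP form element, all sharing the same report."""
    return [
        QAExample(context=report_text, question=element, answer=value)
        for element, value in cap_form.items()
    ]


# Toy inputs (hypothetical, for illustration only).
report = "Right upper lobe, lobectomy: invasive adenocarcinoma, 2.1 cm. Margins uninvolved."
form = {"Histologic Type": "Invasive adenocarcinoma", "Tumor Size": "2.1 cm"}

examples = build_examples(report, form)
for ex in examples:
    print(f"{ex.question} -> {ex.answer}")
```

Each form therefore yields as many examples as it has elements, and every example carries the same report as context.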
For this initial experiment, we collected a dataset of 615 lung resection forms from 609 patients. The forms were reported between November 2022 and March 2025. The demographics table summarizes this dataset, counted per form (N=615). Note that this is a silver-standard dataset: no pathologist annotations were made to confirm that the rest of the pathology report contained sufficient information to populate the associated CAP form.
Methods and Results
For this experiment, we used several state-of-the-art LLMs to assess how well they extract the information required for lung resection CAP forms. As mentioned before, we formulated the task as a question-answering problem. The full set of top-level questions for the lung resection CAP forms, alongside a description of each one, can be found here. Each LLM was tested in a zero-shot setting, with no examples provided. Each LLM was asked to answer a single question at a time, given the entire pathology report (excluding the synoptic report) and a prompt containing the most recent lung resection CAP form instructions. We asked the LLMs to return their answers as JSON to simplify parsing. The prompt used can be found here.
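As a rough sketch of the zero-shot setup described above, the snippet below assembles a single-question prompt and parses a JSON reply. The prompt wording, the `"answer"` key, and the helper names `build_prompt` and `parse_answer` are assumptions made for illustration; the actual prompt is the one linked above, and no model call is shown.

```python
import json


def build_prompt(report_text, question, instructions):
    """Assemble a zero-shot prompt: instructions, report, then one question."""
    return (
        f"{instructions}\n\n"
        f"Pathology report:\n{report_text}\n\n"
        f"Question: {question}\n"
        'Respond with JSON of the form {"answer": "<value>"}.'
    )


def parse_answer(raw_output):
    """Return the 'answer' field as a string, or None if no valid JSON object is found."""
    # Models sometimes wrap JSON in extra text, so take the outermost braces.
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    answer = payload.get("answer")
    return None if answer is None else str(answer)


prompt = build_prompt(
    report_text="Invasive adenocarcinoma, 2.1 cm.",       # toy report text
    question="Tumor Size",                                 # hypothetical element name
    instructions="Follow the CAP lung resection protocol.",
)
print(parse_answer('Here is the result: {"answer": "2.1 cm"}'))
```

Returning `None` on malformed output keeps parsing failures distinguishable from wrong answers when results are aggregated.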
To automatically evaluate the output, we used BERTScore, which measures the semantic similarity between the generated answer and the reference answer using contextual token embeddings. The results of this evaluation are shown in the figure below.
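To make the metric concrete, here is a minimal sketch of the greedy-matching computation at the core of BERTScore, run on toy vectors. It is not the evaluation code used for this experiment: real BERTScore takes token embeddings from a BERT-family model and optionally applies idf weighting and baseline rescaling, none of which is shown, and the function names below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, F1) from greedy token matching.

    Precision: each candidate token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar candidate token.
    """
    if not cand_vecs or not ref_vecs:
        return 0.0, 0.0, 0.0
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy "embeddings": the candidate has one extra token unrelated to the reference.
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0]]
p, r, f1 = greedy_match_scores(candidate, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case the extra candidate token lowers precision while recall stays perfect, which is the kind of behavior to keep in mind when a model gives a longer answer than the reference.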